Computing the Stereo Matching Cost with a Convolutional Neural Network
We present a method for extracting depth information from a rectified image
pair. We train a convolutional neural network to predict how well two image
patches match and use it to compute the stereo matching cost. The cost is
refined by cross-based cost aggregation and semiglobal matching, followed by a
left-right consistency check to eliminate errors in the occluded regions. Our
stereo method achieves an error rate of 2.61 % on the KITTI stereo dataset and
is currently (August 2014) the top-performing method on this dataset.
Comment: Conference on Computer Vision and Pattern Recognition (CVPR), June 201
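The left-right consistency check mentioned in the abstract can be illustrated on a single scanline. This is a minimal sketch rather than the authors' implementation: the function name `left_right_check`, the tolerance `max_diff`, and the convention that a left pixel at column x with disparity d corresponds to right column x - d are all assumptions made for illustration.

```python
def left_right_check(disp_left, disp_right, max_diff=1):
    """Flag left-image disparities that agree with the right-image disparities.

    Assumed convention: left pixel x with disparity d matches right pixel x - d.
    Returns one boolean per pixel; False marks a likely occlusion or mismatch.
    """
    width = len(disp_left)
    consistent = []
    for x, d in enumerate(disp_left):
        xr = x - d  # column of the matching pixel in the right image
        ok = 0 <= xr < width and abs(d - disp_right[xr]) <= max_diff
        consistent.append(ok)
    return consistent


# Toy scanline: pixels 1 and 5 point outside the image or disagree with the right map.
disp_left = [0, 3, 1, 2, 2, 5]
disp_right = [1, 1, 2, 2, 0, 0]
mask = left_right_check(disp_left, disp_right)
```

Pixels flagged False would then be filled in from neighbouring consistent pixels; how that interpolation is done is not stated in the abstract.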
Training deep neural networks for stereo vision
We present a method for extracting depth information from a rectified image
pair. Our approach focuses on the first stage of many stereo algorithms: the
matching cost computation. We approach the problem by learning a similarity
measure on small image patches using a convolutional neural network. Training
is carried out in a supervised manner by constructing a binary classification
data set with examples of similar and dissimilar pairs of patches.
We examine two network architectures for learning a similarity measure on image
patches. The first architecture is faster than the second, but produces
disparity maps that are slightly less accurate. In both cases, the input to the
network is a pair of small image patches and the output is a measure of
similarity between them. Both architectures contain a trainable feature
extractor that represents each image patch with a feature vector. The
similarity between patches is measured on the feature vectors instead of the
raw image intensity values. The fast architecture uses a fixed similarity
measure to compare the two feature vectors, while the accurate architecture
attempts to learn a good similarity measure on feature vectors.
The output of the convolutional neural network is used to initialize the stereo
matching cost. A series of post-processing steps follow: cross-based cost
aggregation, semiglobal matching, a left-right consistency check, subpixel
enhancement, a median filter, and a bilateral filter.
We evaluate our method on the KITTI 2012, KITTI 2015, and Middlebury stereo
data sets and show that it outperforms other approaches on all three data sets.
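To make the idea of a fixed similarity measure on feature vectors concrete, the sketch below assumes cosine similarity as that measure and treats the feature vectors as already computed by some feature extractor. The helper names (`cosine_similarity`, `matching_cost`, `winner_take_all`) are hypothetical, and the winner-take-all step stands in for the post-processing pipeline, which is not reproduced here.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two feature vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def matching_cost(left_feats, right_feats, max_disp):
    """cost[x][d] is the negated similarity of left pixel x and right pixel x - d."""
    cost = []
    for x, f_left in enumerate(left_feats):
        row = []
        for d in range(max_disp + 1):
            if x - d >= 0:
                row.append(-cosine_similarity(f_left, right_feats[x - d]))
            else:
                row.append(float("inf"))  # candidate falls outside the right image
        cost.append(row)
    return cost


def winner_take_all(cost):
    """Pick, for every pixel, the disparity with the lowest matching cost."""
    return [min(range(len(row)), key=row.__getitem__) for row in cost]


# Toy scanline of 2-D feature vectors; the left row is the right row shifted by one pixel.
right_feats = [[1, 0], [0, 1], [1, 1], [2, 1]]
left_feats = [[1, 2], [1, 0], [0, 1], [1, 1]]
disparities = winner_take_all(matching_cost(left_feats, right_feats, max_disp=2))
```

In the accurate architecture, `cosine_similarity` would be replaced by a learned comparison of the two feature vectors.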
Training deep neural networks for stereo vision
In this doctoral dissertation we present a method for computing the matching
cost for the stereo vision problem. Stereo data sets, such as KITTI and
Middlebury, have in recent years become large enough that the problem can be
approached with learning-based methods. Our approach is based on a deep
convolutional neural network and a supervised machine learning algorithm. We
construct the training set from publicly available stereo data sets. A training
example consists of a pair of image patches and belongs to one of two classes:
positive, when the two patches are in correspondence, and negative, when they
are not.
Two convolutional neural network architectures for learning similarity are
presented. The first architecture is faster than the second, but the computed
depth map is on average less accurate. In both cases, the input to the neural
network is a pair of image patches and the output is a measure of similarity
between them. Both architectures contain convolutional neural networks that
represent each image patch with a feature vector. The similarity between
patches is computed on the feature vectors instead of on the intensities of
individual pixels. The first architecture compares the feature vectors with
cosine similarity, while the second architecture compares them with a learned
multilayer neural network.
We compare the developed method with established methods on three data sets
(KITTI 2012, KITTI 2015, and Middlebury) and find that our method is the most
accurate on all three.
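The two-class training set of patch pairs described above could be built roughly as follows. This is a sketch on a single scanline under stated assumptions: the function names, the patch half-width `half`, and the negative offsets in `neg_offsets` are hypothetical choices, and the caller is assumed to pass a pixel whose true match lies far enough inside the image.

```python
import random


def extract_patch(row, center, half):
    """Return the 1-D patch of width 2 * half + 1 centred on `center`."""
    return row[center - half:center + half + 1]


def make_pair_examples(left_row, right_row, x, true_disp, half=1,
                       neg_offsets=(-3, -2, 2, 3), rng=random):
    """Build one positive and one negative example for left pixel x.

    Each example is (left_patch, right_patch, label), with label 1 when the
    right patch is centred on the true match and 0 when it is shifted away.
    """
    width = len(right_row)
    left_patch = extract_patch(left_row, x, half)
    pos_center = x - true_disp  # assumed convention: match lies at x - d
    candidates = [pos_center + o for o in neg_offsets
                  if half <= pos_center + o < width - half]
    if not candidates:
        raise ValueError("no in-bounds negative location for this pixel")
    neg_center = rng.choice(candidates)
    positive = (left_patch, extract_patch(right_row, pos_center, half), 1)
    negative = (left_patch, extract_patch(right_row, neg_center, half), 0)
    return positive, negative


# Toy rows: the left row is the right row shifted by a disparity of 2.
right_row = [10, 20, 30, 40, 50, 60, 70, 80]
left_row = [0, 0, 10, 20, 30, 40, 50, 60]
positive, negative = make_pair_examples(left_row, right_row, x=5, true_disp=2,
                                        rng=random.Random(0))
```

Sampling the negative a few pixels away from the true match, rather than at a random location, is a design choice assumed here; it yields harder negatives for the classifier.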
Linker and Loader for the HIP Processor
In this work a collection of programs called hiputils is presented. The toolchain comprises an assembler, linker, dynamic loader, simulator and a static library creation utility for the HIP processor. A precise description of the process of creating, linking and loading static and dynamic libraries in hiputils is given. A format for object files, static and dynamic libraries is also defined.
Besides hiputils, linking and loading of programs and libraries is also described. Several object file formats, including COM, a.out and ELF, are studied and compared. The three main tasks of linkers (storage allocation, symbol management and relocation) are detailed. A description of libraries, static as well as dynamic, is also given, along with a description of dynamic loading and relocation. A mechanism that allows code to run at an arbitrary start address (position-independent code) is also described.
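The three linker tasks named in the abstract can be shown with a toy linker. This sketch does not use the hiputils object format, which the abstract does not specify; the module layout (a dictionary with `code`, `defs` and `relocs` entries) and the function name `link` are hypothetical.

```python
def link(modules, load_address=0):
    """Link toy object modules into one flat image.

    Each module is a dict with:
      code   - list of integer words
      defs   - symbol name -> offset within this module
      relocs - list of (offset, symbol name); the word at that offset is an
               addend to which the symbol's final address is added
    """
    # 1. Storage allocation: place the modules one after another.
    bases = []
    address = load_address
    for module in modules:
        bases.append(address)
        address += len(module["code"])

    # 2. Symbol management: build the global symbol table.
    symbols = {}
    for base, module in zip(bases, modules):
        for name, offset in module["defs"].items():
            if name in symbols:
                raise ValueError(f"duplicate symbol: {name}")
            symbols[name] = base + offset

    # 3. Relocation: patch every reference with the final address.
    image = []
    for module in modules:
        code = list(module["code"])
        for offset, name in module["relocs"]:
            if name not in symbols:
                raise ValueError(f"undefined symbol: {name}")
            code[offset] += symbols[name]
        image.extend(code)
    return image, symbols


main = {"code": [1, 0, 2], "defs": {"main": 0}, "relocs": [(1, "helper")]}
lib = {"code": [7, 8], "defs": {"helper": 1}, "relocs": []}
image, symbols = link([main, lib], load_address=100)
# image is [1, 104, 2, 7, 8]; symbols is {"main": 100, "helper": 104}
```

Because the patched words hold absolute addresses, the image only runs correctly at the chosen `load_address`; position-independent code avoids this by not embedding absolute addresses in the code.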
Orange: data mining toolbox in Python
Orange is a machine learning and data mining suite for data analysis through Python scripting and visual programming. Here we report on the scripting part, which features interactive data analysis and component-based assembly of data mining procedures. In the selection and design of components, we focus on the flexibility of their reuse: our principal intention is to let the user write simple and clear scripts in Python, which build upon C++ implementations of computationally intensive tasks. Orange is intended both for experienced users and programmers, as well as for students of data mining.